In this paper a parallel architecture is developed to compute the linear
convolution of two sequences of arbitrary lengths using the Fermat number
transform (FNT). In particular a pipeline structure is designed to compute a
128-point FNT. In this FNT, only additions and bit rotations are required. A
standard barrel shifter circuit is modified so that it performs the required
bit rotation operation.


The overlap-save method is generalized for the FNT to compute a linear
convolution of arbitrary length. A parallel architecture is developed to
realize this type of overlap-save method using one FNT and several inverse
FNTs of 128 points. The generalized overlap-save method alleviates the usual
dynamic range limitation in FNTs of long transform lengths. Its architecture
is regular, simple, and expandable, and therefore naturally suitable for VLSI
implementation.


I. Introduction 

Fermat number transforms (FNTs) were developed to compute cyclic convolutions
(Refs. 1-3). A cyclic convolution of two sequences can be obtained by taking
the inverse FNT of the product of the FNTs of these two sequences.

FNTs over certain transform lengths have the advantage over most number
theoretic transforms in that no multiplications are required. McClelland
(Ref. 4) designed a hardware system to realize a 64-point 17-bit FNT that
used commercially available ECL IC chips. For this purpose he developed a new
binary number representation and the binary arithmetic operations modulo a
Fermat number (Refs. 4, 5). The Fermat number transform can be applied to
digital filtering (Refs. 2, 3), image processing (Refs. 6, 7), X-ray
reconstruction (Ref. 8), and to the encoding and decoding of certain
Reed-Solomon codes (Refs. 9, 10).

*This work was supported in part by the JPL Director's Discretionary Fund,
FY82.

In this paper, a parallel architecture is designed to realize a digital
filter of arbitrary length using the FNT. In Section II, a pipeline structure
is used to compute a 128-point FNT. Only additions and bit rotations are
required in this structure. The bit rotation operations are implemented by a
modification of a standard barrel shifter circuit (Ref. 11). In Section III,
the overlap-save method is generalized to compute the linear convolution of a
digital filtering system. Then a parallel architecture is designed to realize
the generalized overlap-save method using one FNT and several inverse FNTs of
128 points. The circuit design of an FNT butterfly is given in the Appendix.

II. A Parallel Structure for Computing a
128-Point FNT

Let $F_t = 2^{2^t} + 1$ be the $t$-th Fermat number, where $t \ge 0$. $F_t$
is a prime number for $0 \le t \le 4$. Let $\{x_n\}$ be an $N$-point sequence
of integer numbers, where $0 \le x_n \le F_t - 1$, $0 \le n \le N - 1$, and
$N$ is a power of 2. The Fermat number transform $\{X_k\}$ of $\{x_n\}$ over
$F_t$ is defined as follows:

$$X_k \equiv \sum_{n=0}^{N-1} x_n \alpha^{nk} \pmod{F_t} \qquad (1)$$

where $0 \le k \le N - 1$ and $\alpha$ is an $N$th root of unity. That is,
$N$ is the least positive integer such that $\alpha^N \equiv 1 \pmod{F_t}$.
The corresponding inverse FNT is the following:

$$x_n \equiv \frac{1}{N} \sum_{k=0}^{N-1} X_k \alpha^{-nk} \pmod{F_t} \qquad (2)$$

In order that a cyclic convolution can be computed by the FNT pair in
Eqs. (1) and (2), N depends on the $F_t$ and $\alpha$ chosen (Refs. 2, 3).
More details of an FNT can be found in Refs. 2 and 3.
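The transform pair in Eqs. (1) and (2) and its use for cyclic convolution can
be sketched in a few lines of Python. This is only an illustrative sketch, not
the hardware structure of the paper: it evaluates the sums directly, and the
small parameters $F_2 = 17$, $\alpha = 2$, $N = 8$, as well as the function
names, are assumptions chosen to keep the example short.

```python
# Illustrative parameters (assumed for this sketch, not the paper's choice):
# F_2 = 2^4 + 1 = 17 and alpha = 2, which has order 8 modulo 17.
F2 = 17
ALPHA = 2


def fnt(x, alpha=ALPHA, mod=F2):
    """Direct O(N^2) evaluation of Eq. (1)."""
    n_pts = len(x)
    return [sum(x[n] * pow(alpha, n * k, mod) for n in range(n_pts)) % mod
            for k in range(n_pts)]


def ifnt(X, alpha=ALPHA, mod=F2):
    """Direct evaluation of Eq. (2); 1/N and 1/alpha are modular inverses."""
    n_pts = len(X)
    alpha_inv = pow(alpha, -1, mod)   # requires Python 3.8 or later
    n_inv = pow(n_pts, -1, mod)
    return [(n_inv * sum(X[k] * pow(alpha_inv, n * k, mod)
                         for k in range(n_pts))) % mod
            for n in range(n_pts)]


def cyclic_convolve(x, h, alpha=ALPHA, mod=F2):
    """Cyclic convolution as the inverse FNT of the product of two FNTs."""
    X, H = fnt(x, alpha, mod), fnt(h, alpha, mod)
    return ifnt([(a * b) % mod for a, b in zip(X, H)], alpha, mod)


x = [1, 2, 3, 0, 0, 0, 0, 0]
h = [1, 1, 0, 0, 0, 0, 0, 0]
y = cyclic_convolve(x, h)   # results are exact only while they stay below 17
```

The result is exact only as long as every true convolution value lies in the
range representable modulo the chosen Fermat number, which is the dynamic
range limitation discussed in the text.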

In this paper $F_t$, $\alpha$, and $N$ are selected specifically to be
$F_5 = 2^{32} + 1$, $\sqrt{2}$, and 128 respectively. That is, the data of
this FNT are integers between 0 and $2^{32}$. Hence 33 bits are required to
represent a number. The transform length of this FNT is 128. In an FNT over
$F_t$, the quantity $\sqrt{2}$ represents the integer
$2^{2^{t-2}} (2^{2^{t-1}} - 1)$ (Refs. 2, 3). For $t = 5$, since
$2^{32} \equiv -1 \pmod{F_5}$,
$\sqrt{2} = 2^{24} - 2^{8} \equiv 2^{24} + 2^{40}$. A conservative value of
the dynamic range (Ref. 12) is 2^[illegible]. This value is sufficiently
large for a number of applications.
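The stated properties of $\sqrt{2}$ modulo $F_5$ can be checked numerically.
The following is a small sketch; the helper names `sqrt2_mod_fermat` and
`multiplicative_order` are hypothetical and introduced only for this
illustration.

```python
def sqrt2_mod_fermat(t):
    """Integer that represents sqrt(2) modulo F_t = 2^(2^t) + 1, for t >= 2."""
    return 2 ** (2 ** (t - 2)) * (2 ** (2 ** (t - 1)) - 1)


def multiplicative_order(a, mod, limit=1024):
    """Least positive e <= limit with a^e = 1 (mod mod), found by search."""
    for e in range(1, limit + 1):
        if pow(a, e, mod) == 1:
            return e
    raise ValueError("order exceeds search limit")


F5 = 2 ** 32 + 1
SQRT2 = sqrt2_mod_fermat(5)                   # equals 2^24 - 2^8

squares_to_two = pow(SQRT2, 2, F5) == 2       # (sqrt 2)^2 = 2 (mod F_5)
same_as_sum = (2 ** 24 + 2 ** 40) % F5 == SQRT2
order = multiplicative_order(SQRT2, F5)       # transform length N
bits_needed = (F5 - 1).bit_length()           # largest datum is 2^32
```

Since the order of $\sqrt{2}$ is 128, it serves as the root of unity for a
128-point transform, and the largest datum $2^{32}$ needs 33 bits.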

Since the FNT has a mathematical algorithm similar to the FFT, an FFT-type
structure can be applied to perform a fast FNT. Figure 1 shows a pipeline
structure (Ref. 13) for computing a 128-point FNT over $F_5$. The radix-2
decimation-in-time (DIT) technique is used in this structure. The structure
for performing an inverse FNT is the mirror image of the circuit shown in
Fig. 1 if the radix-2 decimation-in-frequency (DIF) technique is used.
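The radix-2 DIT decomposition can be illustrated in software. The sketch
below is a recursive formulation, not the pipeline of Fig. 1; it assumes the
parameters $F_5 = 2^{32} + 1$ and $\alpha = 2^{24} - 2^{8}$ given in the text,
and the function names are hypothetical. Ordinary integer multiplication is
used for the twiddle factors, whereas the hardware replaces it with bit
rotations.

```python
F5 = 2 ** 32 + 1
SQRT2 = 2 ** 24 - 2 ** 8      # root of unity of order 128 modulo F5


def direct_fnt(x, alpha=SQRT2, mod=F5):
    """Reference O(N^2) evaluation of the transform definition."""
    n = len(x)
    return [sum(x[i] * pow(alpha, i * k, mod) for i in range(n)) % mod
            for k in range(n)]


def fast_fnt(x, alpha=SQRT2, mod=F5):
    """Recursive radix-2 decimation-in-time FNT; len(x) is a power of 2."""
    n = len(x)
    if n == 1:
        return [x[0] % mod]
    alpha_sq = alpha * alpha % mod
    even = fast_fnt(x[0::2], alpha_sq, mod)   # transform of even samples
    odd = fast_fnt(x[1::2], alpha_sq, mod)    # transform of odd samples
    out = [0] * n
    twiddle = 1
    for k in range(n // 2):
        t = twiddle * odd[k] % mod
        # Butterfly: alpha^(n/2) = -1, so the second output uses a minus sign.
        out[k] = (even[k] + t) % mod
        out[k + n // 2] = (even[k] - t) % mod
        twiddle = twiddle * alpha % mod
    return out


samples = [(7 * n * n + 3) % 1000 for n in range(128)]
spectrum = fast_fnt(samples)
```

A 128-point transform in this form has seven butterfly stages, each of which
needs only additions, subtractions, and multiplications by powers of
$\sqrt{2}$.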

In Fig. 1 $z^{-l}$ denotes an $l$-step delay element, which can be realized
by a set of $l$ first-in-first-out (FIFO) registers. The symbolic diagram and
operations of a DIT FNT butterfly are shown in Fig. 2. The design of this
butterfly is given in the Appendix. A similar DIF FNT butterfly was designed
in Ref. 4.

In Fig. 1, each shuffle-exchange switch is controlled by a control signal
$S_i$ for $1 \le i \le 6$. The operations of these switches are shown in
Fig. 3. The $S_i$'s can be implemented simply by a 6-stage up-counter if no
buffer registers are used in the FNT butterflies (Ref. 13). With the buffer
registers in the butterflies, delay elements are needed at the outputs of the
counter, as shown in Fig. 4, for the purpose of synchronization.

In the next section the overlap-save method (Ref. 13) is generalized to
implement a digital filter of arbitrary length using one FNT and several
inverse FNTs of 128 points over $F_5$. Then a parallel VLSI architecture is
designed to realize this overlap-save method using the FNT structure designed
above.

III. A Digital Filter Architecture of
Arbitrary Length Using the FNT

In the previous section $F_t$, $\alpha$, and $N$ are chosen to be $F_5$,
$\sqrt{2}$, and 128 respectively. $N = 128$ is the maximum transform length
over $F_5$ (Refs. 2, 3), and 2^[illegible] is the dynamic range. One could
increase the transform length by choosing $F_t$ for $t \ge 6$. In so doing,
however, at least $2^6 + 1 = 65$ bits are required to represent a number.
Alternatively, one could use a specific $\alpha$, where $\alpha$ is not a
power of $\sqrt{2}$, over $F_3$ or $F_4$ to increase the transform length. In
such a case a complete multiplication is required. In addition, the dynamic
range is used up readily. To remedy this difficulty, the overlap-save method
is generalized to compute the linear convolution of a digital filter of
arbitrary input data and filter lengths. A parallel architecture is developed
to realize this generalized overlap-save method using the 128-point FNT
structure designed in the previous section.

Let $\{x_n\}$ and $\{h_m\}$ be the input and filter sequences of a digital
filter, respectively, where $0 \le n \le N - 1$ and $0 \le m \le M - 1$. The
output sequence $\{y_k\}$ of the filter is the linear convolution of
$\{x_n\}$ and $\{h_m\}$, where $0 \le k < N + M - 1$ (Ref. 13). It is shown
(Ref. 13) that such a linear convolution can be obtained by computing a
cyclic convolution. For purposes of exposition it is assumed that $N = 1024$
and $M = 256$ in the following argument.

In order to use 128-point FNTs to compute $\{y_k\}$, four 128-point
subfilters $\{h^1_m\}$, $\{h^2_m\}$, $\{h^3_m\}$, and $\{h^4_m\}$ are formed
by partitioning $\{h_m\}$ as follows:

$$h^i_m = \begin{cases} h_{m+64(i-1)} & \text{for } 0 \le m \le 63 \\ 0 & \text{for } 64 \le m \le 127 \end{cases} \qquad (3)$$

for $1 \le i \le 4$. Next the overlap-save method (Ref. 13) is used to
compute the linear convolution $\{y^i_k\}$ of $\{x_n\}$ and $\{h^i_m\}$ by
using the cyclic convolution technique, where $1 \le i \le 4$ and
$0 \le k < 1087$. To accomplish this, $\{x_n\}$ is sectioned into 128-point
subsequences with 64 points of $\{x_n\}$ overlapped between two consecutive
subsequences. That is, $\{x_n\}$ is sectioned into $\{x^1_n\}$, $\{x^2_n\}$,
$\{x^3_n\}$, ..., $\{x^{15}_n\}$, where the index of $\{x_n\}$ runs over
$0 \le n \le 1023$ and the index within each subsequence runs over
$0 \le n \le 127$. Then $\{y^i_k\}$, for $1 \le i \le 4$, is computed by
overlapping the cyclic convolutions of $\{x^j_n\}$ and $\{h^i_m\}$ for
$1 \le j \le 15$ using 128-point FNTs. Finally the output sequence
$\{y_k\}$, for $0 \le k < 1024 + 256 - 1 = 1279$, results evidently from
$\{y^i_k\}$ for $1 \le i \le 4$ by the following equation:

$$y_k = y^1_k + y^2_{k-64} + y^3_{k-128} + y^4_{k-192}$$

where $y^i_k$ is taken to be zero outside its index range. The relationship
between $\{y_k\}$ and $\{y^i_k\}$ for $1 \le i \le 4$ is illustrated in
Fig. 5. Other cases of the generalized overlap-save method are constructed in
a similar manner.
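The generalized overlap-save procedure can be sketched in Python as follows.
This is a software illustration under stated assumptions, not the
architecture of Fig. 6: the function names are hypothetical, the inputs are
assumed to be non-negative integers small enough that every subfilter output
stays below $F_5$, and the input is zero-padded at both ends so that the edge
outputs are produced as well, which yields more sections than the 15 counted
in the text.

```python
F5 = 2 ** 32 + 1
SQRT2 = 2 ** 24 - 2 ** 8      # root of unity of order 128 modulo F5
BLOCK, HOP = 128, 64          # transform length and subfilter length


def fnt(x, alpha):
    """Recursive radix-2 DIT transform modulo F5."""
    n = len(x)
    if n == 1:
        return [x[0] % F5]
    alpha_sq = alpha * alpha % F5
    even, odd = fnt(x[0::2], alpha_sq), fnt(x[1::2], alpha_sq)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % F5
        out[k], out[k + n // 2] = (even[k] + t) % F5, (even[k] - t) % F5
        w = w * alpha % F5
    return out


def ifnt(X):
    """Inverse transform: use 1/alpha, then scale by 1/N."""
    inv_n = pow(len(X), -1, F5)
    return [v * inv_n % F5 for v in fnt(X, pow(SQRT2, -1, F5))]


def overlap_save_fnt(x, h):
    """Linear convolution of x and h using only 128-point transforms."""
    out_len = len(x) + len(h) - 1
    # Eq. (3): 64-tap subfilters, each zero-padded to 128 points.
    h_pad = list(h) + [0] * (-len(h) % HOP)
    H = [fnt(h_pad[s:s + HOP] + [0] * HOP, SQRT2)
         for s in range(0, len(h_pad), HOP)]
    # 128-point sections of x with 64 points of overlap.
    n_blocks = -(-(len(x) + HOP - 1) // HOP)
    x_pad = [0] * HOP + list(x) + [0] * (n_blocks * HOP - len(x))
    y = [0] * out_len
    for j in range(n_blocks):
        X = fnt(x_pad[j * HOP:j * HOP + BLOCK], SQRT2)   # one forward FNT
        for i, Hi in enumerate(H):                       # one inverse FNT each
            c = ifnt([a * b % F5 for a, b in zip(X, Hi)])
            for p in range(HOP):
                # Keep the last 64 outputs; subfilter i is shifted by 64*i.
                k = (i + j) * HOP + p
                if k < out_len:
                    y[k] += c[HOP + p]   # ordinary addition, not modulo F5
    return y


x = [(37 * n + 11) % 100 for n in range(200)]
h = [(53 * m + 7) % 100 for m in range(150)]
y = overlap_save_fnt(x, h)
```

Because each 128-point cyclic convolution involves only a 64-tap subfilter,
the final sums over the subfilters are accumulated with ordinary integer
addition and may exceed $F_5$ without error.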

In Fig. 6 is shown the block diagram of an architecture for the generalized
overlap-save method of a digital filter using one FNT and four inverse FNTs
of 128 points. In this system the DIT and DIF techniques are used for the FNT
and inverse FNTs, respectively. In the generalized overlap-save method, one
of the two outputs of the inverse FNT butterfly in the last stage is not
needed. Hence, the inverse FNT butterfly in the last stage is a degenerate
butterfly circuit, and the delay elements associated with this butterfly
circuit are not needed. The $H^i_k$'s in Fig. 6 are the FNTs of
$\{h^i_m\}$. The $(1/N)$ factor in Eq. (2) is incorporated into the
$H^i_k$'s. These $H^i_k$'s can be precomputed and stored in the system. The
adders in Fig. 6 perform normal binary additions, not additions modulo
$F_5$.

The advantages of the generalized overlap-save method for implementing a
digital filter using FNT transforms are the following: (1) It requires no
multiplications. Only additions and bit rotations are needed. (2) It
alleviates the usual dynamic range limitation for long sequence FNTs. (3) It
utilizes the FNT and inverse FNT circuits 100% of the time. (4) The lengths
of the input data and filter sequences can be arbitrary and different.


IV. Conclusion 

A pipeline structure is developed to compute a 128-point Fermat number
transform. In this 128-point FNT, only additions and bit rotations are
required. A barrel shifter circuit is modified to perform the multiplication
of an integer by a power of 2 modulo a Fermat number. The overlap-save method
is generalized to compute the linear convolution of a digital filter with
arbitrary input data and filter lengths. An architecture is developed to
realize this generalized overlap-save method by a simple combination of one
128-point FNT and several inverse FNT structures. This realization alleviates
the dynamic range limitations of the FNT with a long transform length. The
architecture is simple and regular, and hence suitable for VLSI
implementation.

Fig. 3. A shuffle-exchange switch. (a) Direct connection.
(b) Crossed connection

Fig. 4. A 6-stage up-counter used to generate the control signals S_1, ..., S_6 in Fig. 1

Appendix


In this appendix a circuit is designed to implement the DIT FNT butterfly
shown in Fig. 2. A similar DIF FNT butterfly was designed in Ref. 4. To
efficiently perform the FNT, number representations have been proposed
(Refs. 4, 5) for binary arithmetic operations modulo $F_t$. The diminished-1
representation proposed by Liebowitz (Ref. 5) is used in the following
design. Let $A$ be represented by $[a_{32}\, a_{31} \ldots a_0]$, where
$0 \le A \le 2^{32}$ and $a_i$ is the $i$th bit of $A$. Table A-1 shows the
correspondence between decimal numbers in a normal binary representation and
their values in the diminished-1 representation. The most significant bit
(MSB) $a_{32}$ can be viewed as the zero-detection bit in the diminished-1
representation.
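The correspondence in Table A-1 amounts to storing $A - 1$ for nonzero $A$
and setting only the zero-detection bit for $A = 0$. A minimal sketch of the
conversion, with hypothetical function names, is:

```python
ZERO_FLAG = 1 << 32          # bit a_32, set only for the value 0


def to_diminished1(value):
    """Encode an integer 0 <= value <= 2^32 as a 33-bit diminished-1 code."""
    if not 0 <= value <= 2 ** 32:
        raise ValueError("value must lie between 0 and 2^32")
    return ZERO_FLAG if value == 0 else value - 1


def from_diminished1(code):
    """Decode a 33-bit diminished-1 code back to the integer it represents."""
    return 0 if code & ZERO_FLAG else code + 1


def as_bits(code):
    """33-character bit string, a_32 first."""
    return format(code, "033b")


rows = {v: as_bits(to_diminished1(v)) for v in (0, 1, 2, 2 ** 32)}
```

With this encoding the 33rd bit is needed only to flag zero, so the arithmetic
on nonzero values can work on 32-bit words.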

Two basic binary arithmetic operations modulo $F_5$ with $\alpha = \sqrt{2}$
are addition and multiplication by a power of 2. Other operations can be
expressed in terms of these two operations. In the following, some details of
these operations are described briefly. More specifics can be found in
Ref. 5.

(1) Addition: Let $S = A + B$. If $A = 0$, then $S = B$. If $B = 0$,
    then $S = A$. If neither $A$ nor $B$ equals 0, add
    $[a_{31}\, a_{30} \ldots a_1\, a_0]$ and
    $[b_{31}\, b_{30} \ldots b_1\, b_0]$, complement the carry and add it to
    the previous sum. This yields $S$.

(2) Multiplication by a power of 2: Let $B = A \cdot 2^C$. If $A = 0$, then
    $B = 0$. If $A \ne 0$, left rotate $[a_{31}\, a_{30} \ldots a_0]$ by $C$
    bit positions, but complement the value of bit 31 when it is rotated to
    bit position 0, and set $b_{32} = 0$.

(3) Negation: Since $2^{32} \equiv -1 \pmod{F_5}$, $-A \equiv A \cdot 2^{32}$.
    Hence if $A \ne 0$,
    $-A = [a_{32}\, \bar{a}_{31}\, \bar{a}_{30} \ldots \bar{a}_1\, \bar{a}_0]$,
    where $\bar{a}_i$ denotes the complement of $a_i$. If $A = 0$, then
    $-A = 0$.

(4) Multiplication by $\sqrt{2}$: Since $\sqrt{2} = 2^{24} + 2^{40}$,
    $A \cdot \sqrt{2} \equiv A \cdot 2^{24} + A \cdot 2^{40}$.

(5) Multiplication by a power of $\sqrt{2}$: Let $B = A \cdot (\sqrt{2})^C$.
    If $C$ is even, then $B = A \cdot 2^{C/2}$. If $C$ is odd, then
    $B = (A \cdot \sqrt{2}) \cdot 2^{(C-1)/2}$.
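The five operations above can be modeled bit by bit in Python. The sketch
below is an assumption-laden illustration rather than the circuit of
Fig. A-1: the function names are hypothetical, codes are held in Python
integers, and a sum that is congruent to zero is detected as an overflow of
the second addition, a detail for which the text refers to Ref. 5.

```python
F5 = 2 ** 32 + 1
MASK = 2 ** 32 - 1           # bits a_31 ... a_0
ZERO = 1 << 32               # diminished-1 code of the value 0


def enc(value):
    """Diminished-1 code of value modulo F5."""
    value %= F5
    return ZERO if value == 0 else value - 1


def dec(code):
    return 0 if code & ZERO else code + 1


def d1_add(a, b):
    """(1) Add the low 32 bits, then add the complemented carry."""
    if a & ZERO:
        return b
    if b & ZERO:
        return a
    total = a + b
    carry = total >> 32
    total = (total & MASK) + (1 - carry)
    return ZERO if total >> 32 else total    # overflow here means sum is 0


def d1_mul_pow2(a, c):
    """(2) Left rotate c positions, complementing each bit that wraps around."""
    if a & ZERO:
        return ZERO
    for _ in range(c):
        msb = (a >> 31) & 1
        a = ((a << 1) & MASK) | (msb ^ 1)
    return a


def d1_neg(a):
    """(3) Complement bits a_31 ... a_0; zero is its own negative."""
    return a if a & ZERO else a ^ MASK


def d1_mul_sqrt2(a):
    """(4) A * sqrt(2) = A * 2^24 + A * 2^40."""
    return d1_add(d1_mul_pow2(a, 24), d1_mul_pow2(a, 40))


def d1_mul_pow_sqrt2(a, c):
    """(5) Split an odd exponent into one sqrt(2) and a power of 2."""
    if c % 2 == 0:
        return d1_mul_pow2(a, c // 2)
    return d1_mul_pow2(d1_mul_sqrt2(a), (c - 1) // 2)


example = dec(d1_mul_pow_sqrt2(enc(12345), 7))
```

Only shifts, complements, and additions appear in these routines, which is
why the butterfly needs no general multiplier.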

In Fig. A-1 is shown a block diagram of the FNT butterfly shown in Fig. 2.
In this design, A, B, D, and E are 33-bit data, and C is the 7-bit exponent
$nk$ in Eq. (1). Two realizations of an FNT adder can be found in Ref. 4.
Figure A-2 shows a pass-transistor full-adder, which requires less silicon
area. The multiplier in Fig. A-1 is used to multiply a number by a power of 2
modulo $F_5$. Figure A-3 shows a block diagram of this multiplier. The
shifter in Fig. A-3 is a modification of a barrel shifter (Ref. 11) for
performing bit rotation operations.

For purposes of illustration, consider the simple FNT over
$F_2 = 2^4 + 1$. In such an FNT butterfly the functional table and circuit of
a modified barrel shifter are shown in Fig. A-4, where the inputs are the
data bits and the control bits $[s_3\, s_2\, s_1\, s_0]$, and the outputs are
the rotated data bits.

Table A-1. The correspondence among decimal numbers, their values in the
normal binary representation, and in the diminished-1 representation

Decimal      Normal binary representation      Diminished-1 representation
number       a32 a31 a30 ... a2  a1  a0        a32 a31 a30 ... a2  a1  a0

0             0   0   0  ...  0   0   0         1   0   0  ...  0   0   0
1             0   0   0  ...  0   0   1         0   0   0  ...  0   0   0
2             0   0   0  ...  0   1   0         0   0   0  ...  0   0   1
2^32 - 2      0   1   1  ...  1   1   0         0   1   1  ...  1   0   1
2^32 - 1      0   1   1  ...  1   1   1         0   1   1  ...  1   1   0
2^32          1   0   0  ...  0   0   0         0   1   1  ...  1   1   1